home *** CD-ROM | disk | FTP | other *** search
- /************************************************************************/
- /* hack.doc */
- /************************************************************************/
-
- /************************************************************************/
- /* History */
- /* */
- /* 82Dec07 CrT Completed. */
- /* 82Nov23 CrT Created. */
- /************************************************************************/
-
- /************************************************************************/
- /* Audience */
- /* */
- /* Folks trying to read or modify the Citadel code. */
- /************************************************************************/
-
- /************************************************************************/
- /* Purpose */
- /* */
- /* Explain the basic data structures and algorithms. */
- /************************************************************************/
-
- /************************************************************************/
- /* Overview */
- /************************************************************************/
-
- The fundamental structure of the system is very simple. CtdlMsg.sys
- is a circular file. New messages are simply written around the buffer
- in an endless circle, overwriting old messages in their way. There is
- no other way of deleting messages, and text is never shuffled on disk.
- Messages are numbered consecutively and start with an FF (hex)
- byte. Except for this FF start-of-message byte, all bytes in the message
- file have the high bit set to 0. This means that in principle it is
- trivial to scan through the message file and locate message N if it
- exists, or return error. (Complexities, as usual, crop up when we
- try for efficiency...)
- Each room is basically just a list of message numbers. Each time
- we enter a new message in a room, we slide all the old message-numbers
- down a slot, and probably the oldest one falls off the bottom. Reading
- a room is just a matter looking up the messages one by one and printing
- them out. If the message has been overwritten already, we just ignore it.
- Implementing the "new message" function is also trivial in principle:
- we just keep track, for each caller in the userlog, of the highest-numbered
- message which existed on the >last< call. (Remember, message numbers are
- simply assigned sequentially each time a message is created. This
- sequence is global to the entire system, not local within a room.) If
- we ignore all message-numbers in the room less than this, only new messages
- will be printed. Voila! Code up the system described thus far, and
- you'll have a good approximation to Version 1. Better stop and reread
- everything to here, so you can pick out the fundamental mechanisms among
- all of Version 2's bells and whistles.
-
- /************************************************************************/
- /* message format on disk (ctdlMsg.sys) */
- /************************************************************************/
-
- Message format has changed relative to V1, in the direction of using
- more disk space and providing greater flexibility.
-
- A message now consists of a sequence of character strings. Each string
- begins with a type byte indicating the meaning of the string and is
- ended with a null. All strings are printable ASCII: in particular,
- all numbers are in ASCII rather than binary. This is for simplicity,
- both in implementing the system and in implementing other code to
- work with the system. For instance, a database driven off Citadel archives
- can do wildcard matching without worrying about unpacking binary data such
- as dates first. To provide later downward compatability,
- all software should be written to IGNORE fields not currently defined.
-
-
- /************************************************************************/
- /* The type bytes currently defined are: */
- /************************************************************************/
-
- BYTE Mnemonic Comments
-
- 0xFF Start-of-message indicator. Followed by local
- message ID# as ASCII string, null-terminated as
- always. This byte is the >only< byte which has
- the high bit set in a Citadel message.buf file.
- This field must be present in every message.
- A Author Name of originator of message.
- D Date Date message was created.
- M Message Text of message. Is last field in a message, by
- definition. Following data will be ignored.
- This field must be present in every message.
- N Name Human name for node originated on. Used on
- title line of foreign messages. Ex:
- ODD-DATA
- will produce a title message something like
- 82Nov23 from Cynbe ru Taren @ODD-DATA
- O Origin ID of node message originated on: Country code plus
- phone number of system. (Not stored for locally
- originated messages.) Ex:
- US 206 633 3282
- R Room Room of origin. Topic.
- S Source ID# Message ID # on system message was created on.
- Two 16-bit integers (high and low halves of
- full 32-bit ID#) separated by a blank. Ex:
- 0 13654
- T To Addressee. Used only for private messages in Mail>,
- in version 2.00 .
- EXAMPLE
-
- Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte. Then a message
- which prints as
-
- LOGLAN> read new
-
- 82Nov04 From James Cooke Brown
- Loi uiue i Ti logla
-
- LOGLAN>
-
- might be stored as
-
- <FF>0 3583<0>D82Nov04<0>AJames Cooke Brown<0>RLOGLAN<0>MLoi uiue i Ti logla<0>
- |--Local ID--|---Date---|-----Author---------|--Room---|-------Message--------|
-
- The date, room and author fields could be in any order. Not all fields
- are printed by default. The local ID# and Room field are suppressed here.
- An isolated system will not normally have use for fields beyond those
- used in this example.
-
- Lines are marked with C NewLine (ASCII LF) characters, within the message
- text proper.
-
- /************************************************************************/
- /* Networking */
- /************************************************************************/
-
- Citadel nodes network by sharing one or more rooms. Any Citadel node
- can choose to keep an image of any public room on any other Citadel node
- (subject to willingness to foot the phone bills, of course!). The
- procedure in essence simply involves calling the imaged node up periodically
- and copying any new messages in the imaged room into the local image.
-
- There is no necessary reciprocity or pre-arrangement, although convenience
- and politeness respectively suggest both. The node which gets the
- information foots the phone bill for the transaction. This promotes
- simple and harmonious relations between the nodes.
-
- Complexities arise primarily from the possibility of densely connected
- networks: one does not wish to accumulate multiple copies of a given
- message, which can easily happen. Nor does one want to see old messages
- percolating indefinitely through the system.
-
- This problem is handled by a simple brute-force mechanism: each node
- keeps a list of all messages it has seen recently, recording origin
- system, creation date, and original ID#. When downloading, messages
- which have already been seen, or which are too old to be remembered,
- are skipped. Messages can percolate outward through a large network
- with no global routing or control, but do not reproduce wildly or
- cycle indefinitely.
-
-
- The above discussion should make the function of the new
- fields reasonably clear:
-
- o Every message needs a local ID#, so the system can determine if
- a given caller has seen it before.
- o Travelling messages need to carry system of origin, date of
- origin, and original ID# with them, to keep reproduction and
- cycling under control.
-
-
- (Uncoincidentally) the format used to transmit messages for networking
- purposes is precisely that used on disk, except that lines are marked
- with ASCII CR characters in stead of ASCII LF characters. The current
- distribution does not include active network support code; these comments
- are included in the hope of discouraging people from modifying the disk
- format, date field or whatever in ways which will make their local node
- incompatable with the network at large.
-
- /************************************************************************/
- /* portability problems */
- /************************************************************************/
-
- Modem code is covered elsewhere.
-
- If porting to a C compiler other than BDS C, various problems are likely
- where low-level I/O takes place. Look in particular at:
-
- bios() direct BIOS calls -- console access mostly.
- getc()
- getw()
- inp() input from Z80 port. Mostly modem-related.
- movmem() block transfer in RAM
- outp() output to Z80 port. Mostly modem-related.
- putch() console output
- rsrec() set random-access mode for file
- rread() random file read by sectors
- rwrite() random file write by sectors
- _spr() mprintf(), dprintf() use this printf() support routine.
-
- /************************************************************************/
- /* "Room" records (ctdlRoom.sys) */
- /************************************************************************/
- The rooms are basically indices into ctdlMsg.sys, the message file.
- As noted in the overview, each is essentially an array of pointers into
- the message file. The pointers consist of a 16-bit message ID number
- (we will wrap around at 64K for these purposes) together with a 16-bit
- sector offset within ctdlMsg.sys telling us where the message begins.
-
- Since messages are number sequentially and written circularly, the
- set of messages existing in ctdlMsg.sys will always form a continuous
- sequence at any given time. Thus, by remembering the ID numbers of the
- oldest and newest messages in the message file, we can check to see
- if a message exists >before< going to disk, saving ourselves (and the
- disk drive) the pain of futile seeks in search of the nonexistent.
- This information is recorded in oldestHi/oldestLo (16-bit halves of
- 32-bit numbers, of which we use the low half here) and
- newestHi/newestLo. You'll be seeing more of these...
-
- The newestHi/newestLo is simply incremented each time we enter a
- new message in the message files. OldestHi/oldestLo is incremented
- each time we overwrite an FF (start-of-message) byte in the course
- of writing a new message into the files. This corresponds to dead
- reckoning -- current code never checks to see that the message number
- of the message we are overwriting is what we think it is. In a garbaged
- file with extra FF bytes around, this could cause oldestHi/oldestLo to
- count too rapidly, eventually perhaps overtaking newestHi/newestLo,
- at which time the system will look completely empty. If you suspect
- something like this is going on, just use configur.com to rebuild
- accurate numbers.
-
- That should be enough background to tackle a full-scale room. From
- 200ctdl.h :
-
- struct {
- char rbgen; /* generation number of room */
- char rbflags; /* same bits as flags above */
- char rbname[NAMESIZE]; /* name of this room */
- char rbdisk; /* disk this room's files are in */
- char rbuser; /* user area for this room's files */
- union {
- struct {
- unsigned rbmsgNo; /* every message gets unique# */
- int rbmsgLoc; /* sector message starts in */
- } msg[MSGSPERRM];
- /* we'll ignore the other component here -- not used currently */
- } vp;
- char rbOverRun[2]; /* actually unneeded just now */
- } roomBuf;
-
- [Note that all components start with "rb" for roomBuf, to make sure we
- don't accidentally use an offset in the wrong structure. Be very careful
- also to get a meaningful sequence of components --
- roomBuf.vp.msg[i].rbmsgNo;
- not
- roomBuf.msg[i].rbmsgNo;
- and certainly not
- roomBuf.rbmsgNo;
- BDS C provides no checking on this sort of stuff either. (Fast
- compilation on an 8080 has its prices!)]
-
- Rbgen handles the problem of rooms which have died and been reborn
- under another name. This will be clearer when we get to the log file.
- For now, just note that each room has a generation number which is
- bumped by one each time it is recycled.
-
- Rbflags is just a bag of bits recording the status of the room. The
- defined bits are:
- Bit 0: INUSE. 1 if the room is valid, 0 if it is free for re-assignment.
- Bit 1: PUBLIC. 1 if the room is visible by default, else 0.
- Bit 2: CPMDIR. 1 if the room is a window onto some disk/userspace, else 0.
- Bit 3: PERMROOM. 1 if the room should not be recycled even if empty, else 0.
- (Lobby>, Mail> and all CPMDIRs are automatically permanent.)
-
- Rbname is just an ASCII string (null-terminated, like all strings)
- giving the name of the room.
-
- Rbdisk and rbuser are meaningful only in CPMDIR rooms, in which case
- they give the disk (0 == A:, 1==B: ... 15==P:) and userspace (0..31)
- to window.
-
- Finally, msg is the array of pointers into the message file. RbmsgNo
- is the ID number of the message, and RbmsgLoc is the sector it begins
- in. (For NIL, we stick the value -1 in RbmsgLoc.)
-
- RbOverRun[] at the tail end is a little kludge. We read disk sectors
- directly into roomBuf, so it has to be two sectors long or something
- following will be overwritten. If you resize msg[], be sure to
- recalculate the length of roomBuf and resize rbOverRun[] accordingly.
-
-
- RoomTab is just a little index into ctdlRoom.sys, to keep us from
- constantly pounding around on the disk looking for things. It basically
- records the name and status of each room. It is 100% redundant with
- the file itself... as all our indices are. (As all indices should be!)
- Note that RoomTab is a significant consumer of RAM all by itself. It
- is RAM well spent, but if you have to shave Citadel a few K to make
- it fit your system, cutting the number of rooms a bit is one try.
-
- The only field new to us in roomTab is rtlastMessage, recording the
- most recent message in the room. When we are searching for rooms with
- messages a given caller hasn't seen, we can check this number in RAM
- and avoid unprofitable disk accesses.
-
- /************************************************************************/
- /* log records (ctdlLog.sys) */
- /************************************************************************/
- This is the fun one. Get some fresh air and plug in your thinking cap
- first. (Time, space and complexity are the eternal software rivals.
- We've got 256 log entries x about 500 messages spread over up to 128
- rooms to worry about, and with floppies disk access time is important...
- so perforce, we opt for lots of complexity to keep time and space in bounds.)
-
- To understand what is happening in the log code takes a little persistence.
- You also have to disentangle the different activities going on and
- tackle them one by one.
-
- o We want to remember some random things such as terminal screen
- size, and automatically set them up for each caller at login.
-
- o We want to be able to locate all new messages, and only new
- messages, efficiently. Messages should stay new even if it
- takes a caller a couple of calls to get around to them.
-
- o We want to remember which private rooms a given caller knows
- about, and treat them as normal rooms. This means mostly
- automatically seeking out those with new messages. (Obviously,
- we >don't< want to do this for unknown private rooms!) This
- has to be secure against the periodic recycling of rooms
- between calls.
-
- o We want to support private mail to a caller.
-
- o We want to provide some protection of this information (via
- passwords at login) and some assurance that messages are from
- who they purport to be from (within the system -- one shouldn't
- be able to forge messages from established users).
-
- Lifting another page from 200ctdl.h gives us:
-
- struct logBuffer {
- char lbnulls; /* # nulls to print after newline */
- char lbflags; /* UCMASK, LFMASK, EXPERT */
- char lbwidth; /* terminal width */
- char lbname[NAMESIZE]; /* caller's name */
- char lbpw[NAMESIZE]; /* caller's password */
- char lbgen[MAXROOMS]; /* 5 bits gen, 3 bits lastVisit */
- char lbvisit[MAXVISIT]; /* newestLo on last few calls */
- char lbslot[MAILSLOTS]; /* for private mail */
- char lbId[MAILSLOTS]; /* for private mail */
-
- char lbOverFlow[] /* */
- }
-
- Looks simple enough, doesn't it? One topic at a time:
-
- RANDOM CONFIGURATION PARAMETERS
-
- These are in the first three bytes in the record. Lbnulls gives the
- number of nulls to print after a newline, for folks who like
- simultaneous hardcopy. Or any remaining ASR33 aficionados calling up...
- Lbwidth is the caller's screen width. We format all messages to this
- width, as best we can. Lbflags is another bit-bag, recording
- uppercase-only folks, people who need a linefeed after a carraige-return,
- and people who want to suppress the little automatic hints all through
- the system.
-
-
- FINDING NEW MESSAGES
-
- This is the most important. Thus, it winds up being the most
- elaborate. Conceptually, what we would like to do is mark each
- message with a bit after our caller has read it, so we can avoid
- printing it out again next call. Unfortunately, with 256 log
- entries this would require adding two sectors to each message... and
- we'd wind up reading off disk lots of messages which would never
- get printed. So we resort to an arcane mixture of approximation
- and low animal cunning.
-
- The approximation comes in doing things at the granularity of
- rooms rather than messages. Messages in a given room are "new"
- until we visit it, and "old" after we leave the room... whether
- we read any of them or not. This can actually be defended: anyone
- who passes through a room without reading the contents probably just
- isn't interested in the topic, and would just as soon not be dragged
- back every visit and forced to read them. Given that messages are
- numbered sequentially, we can simply record the most recent message ID#
- of each room as of the last time we visited it. With 128 rooms, this
- would give us (for each user) an array of 128 integers, or 256 bytes.
-
- This is still too much -- I'd like the complete log record for a user
- to be 256 bytes or less, and we have other stuff to do yet.
-
- So, we complicate a little more. We record in lbvisit[MAXVISIT] the
- most recent message ID# in the system on each of the last six calls
- or so. Now, for each room, we can just indicate which call we last
- visited the room on. This takes 3 bits per room, which we stash in
- the low three bits of lbgen[MAXROOMS]. Now we're down to
- 128 rooms x 3 bits (plus a few bytes in lbvisit[], of course),
- which is quite reasonable.
-
- Putting it all together, we can now compute whether a given room
- has new messages for our current caller without going to disk at all:
- > We get the lbgen[] entry for the room in question
- > We mask off the lower 3 bits
- > We use the result as an index into lbvisit[], getting the ID number
- of the most recent message in the system as of the last time we
- visited the room.
- > We compare this with roomTab[].rtlastMessage, which tells us
- what the most recent message in the room is currently.
-
-
- REMEMBERING WHICH PRIVATE ROOMS TO VISIT
-
- This looks trivial at first glance -- just record one bit per room per
- caller in the log records. The problem is that rooms get recycled
- periodically, and we'd rather not run through 256 log entries each
- time we do it. So we adopt a kludge which should work 99% of the time.
-
- As previously noted, each room has a generation number, which is bumped
- by one each time it is recycled. As not noted, this generation number
- runs from 0 -> 31 (and then wraps around and starts over). Thus, these
- numbers take 5 bits to represent. By a miraculous coincidence, we have
- exactly 5 bits left in the lbgen[] entries in the log records. [Anyone
- familiar with "capability" pointers will be encountering deja vu here...]
-
- When someone visits a room, we set the generation number in lbgen[]
- equal to that of the room. This flags the room as being available.
- If the room gets recycled, on our next visit the two generation numbers
- will no longer match, and the room will no longer be available -- just
- the result we're looking for. (Naturally, if a room is marked as PUBLIC,
- all this stuff is irrelevant.)
-
- This leaves only the problem of an accidental matchup between the two
- numbers giving someone access to a Forbidden Room. We can't eliminate
- this danger completely, but it can be reduced to insignificance for
- most purposes. (Just don't bet megabucks on the security of this system!)
- Each time someone logs in, we set all "wrong" generation numbers to be
- one less than the actual generation of the room. This means that the
- room must be recycled thirty-one times before an accidental matchup
- can be achieved. (We do this for all rooms, INUSE or dead, public
- or private, since any of them may be reincarnated as a Forbidden Room.)
-
- Thus, for someone to accidentally be lead to a Forbidden Room, they
- must establish an account on the system, then not call until some room
- has been recycle thirty-one to thirty-two times, which room must be
- reincarnated as a Forbidden Room, which someone must now call back
- (having not scrolled off the userlog in the mean time) and read new
- messages. The last clause is about the only probable one in the sequence.
- The danger of this is much less than the danger that someone will
- simply guess the name of the room outright...
-
-
- SUPPORTING PRIVATE MAIL
-
- Can one have an elegant kludge? This must come pretty close.
-
- Private mail is sent and recieved in the Mail> room, which otherwise
- behaves pretty much as any other room. To make this work, we store
- the actual message pointers in lbslot[] and lbId[] in the caller's
- log record, and then copy them into the Mail> room array whenever we
- enter the room. This requires a little fiddling to get things just
- right. We have to update roomTab[MAILROOM].rtlastMessage at login
- to reflect the presence or absence of new messages, for example. And
- MakeMessage() has to be kludged to ask for the name of the recipient
- of the message whenever a message is entered in Mail>. But basically
- it works pretty well, keeping the code and user interface simple and
- regular.
-
-
- PASSWORDS AND NAME VALIDATION
-
- LogTab[] indexes ctdlLog.sys, giving us a quick way of finding people.
- It is basically a chronologically sorted hash table. We keep a two-byte
- hash of the name and password of each caller in RAM. When someone tries
- to log in, we just whip through the table in order looking for matches
- on the password hash and loading the matching logfile entry in. Bogus
- hits are eliminated by the simple expedient of refusing to acknowledge
- a new user who's name or password hashes on top of an existing user.
- Computer chauvinism at it's best...
- This makes it difficult to forge messages from an existing user. (Fine
- point: nonprinting characters are converted to printing characters, and
- leading, trailing, and double blanks are deleted.)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-